In this study I report results of a quantitative analysis using a large-scale public dataset
(HSLS:09) to investigate which students are taking statistics courses in high school, as a first step
toward understanding the access students have to opportunities to learn statistics concepts and
practices in high schools in the United States. The main result of this study is that predominantly
the top academically performing high school students are earning credit for taking statistics. This is
concerning, as all students should have experiences learning concepts and practices from
statistics to be prepared to engage in and play active roles in today’s data-centric societies. In
line with the conference theme of looking forward, implications for future research and policy
are discussed.
Background and Statement of Problem

Society today is drenched in data (Steen, 2001). Individuals are surrounded by data aimed at 
influencing their decisions about what school policies to implement, politician to vote for, or 
medicine to use. Because of society’s reliance on data, it is increasingly important for people to
be familiar with the key concepts and practices of statistics, the science of data (Ben-Zvi &
Garfield, 2008). As Franklin et al. (2007) state, “every high school graduate
should be able to use sound statistical reasoning to intelligently cope with the requirements of
citizenship” (p. 1). If students are to develop sound statistical reasoning and become critical
citizens in today’s data-driven societies, a goal of public K-12 education must be to provide them
with rich learning experiences with the concepts and practices of statistics.

Teaching statistics in the K-12 setting is firmly rooted in the mathematics curriculum, and
access and opportunities to learn are crucial equity issues to consider in mathematics education
(Gutiérrez, 2009; Schmidt & McKnight, 2012). In the United States, students’ opportunities to 
learn mathematics have been found to vary greatly due to the decentralized nature of education 
in the U.S. (Schmidt & McKnight, 2012). The implementation of the Common Core State 
Standards for Mathematics (CCSSM; National Governors Association Center for Best Practices
(NGA Center) & Council of Chief State School Officers (CCSSO), 2010) was meant to serve as 
a unifying force to reduce some of the variation in the mathematics content covered across states 
in the U.S. However, not all states have signed on to adopt the standards, and other states have 
begun to modify the standards to be implemented in an attempt to distance themselves from 
some of the controversy and politics around the standards (Orrill, 2016). In the CCSSM, data 
analysis and statistics have gained emphasis in grades 6-12 compared to most previous state 
standards. However, there has also been the loss of much of the statistics and probability content 
at the K-5 level, which could have serious ramifications in the future (Lubienski, 2015). 

It is important to point out that standards are not the only factor influencing curriculum as 
classroom teachers and local contexts have a direct influence on the enacted curriculum students 
experience in the classroom (Remillard & Heck, 2014). However, this is a unique issue in the 
case of statistics because, although the instruction of statistics is firmly rooted in the mathematics
curriculum at the K-12 level, statistics is a distinct discipline with concepts and practices that are
non-mathematical (Cobb & Moore, 1997; Groth, 2013). This can cause problems because many
K-12 mathematics teachers have had little to no prior coursework in statistics (Franklin et
al., 2015; Shaughnessy, 2007), which could have serious repercussions for the opportunities
students have to learn statistics in school. Since the enactment of mathematics curriculum varies 
greatly from classroom to classroom based on a number of factors (Remillard & Heck, 2014; 
Schmidt & McKnight, 2012), it is very difficult to study what opportunities students have to 
learn statistics concepts and practices on any kind of large scale. 

In this study, I used the National Center for Education Statistics’ (NCES) High School
Longitudinal Study of 2009 (HSLS:09) public access dataset to begin to investigate which
students are earning statistics credit in high school in an effort to better understand high school
students’ access to opportunities to learn statistics concepts and practices. More specifically, I
investigated the following research questions: 


1. What relationship is there between demographic characteristics (i.e., sex, race, and SES)
of students who earned at least one credit of statistics and those that did not? 

2. What relationship is there between the academic performance of students who earned at 
least one credit of statistics and those that did not? 

3. What relationship is there between the beliefs/attitudes of students who earned at least 
one credit of statistics and those that did not? 


Conceptual Framework 

This study is framed in a larger equity framework, specifically investigating issues around 
achievement and access, which are dimensions of what Rochelle Gutiérrez (2009) refers to as the 
dominant axis of equity. Gutiérrez (2002) offers a basic definition of equity as the “erasure
of the ability to predict students’ mathematics achievement and participation based solely on
characteristics such as race, class, ethnicity, sex, beliefs and creeds, and proficiency in the
dominant language” (p. 153). Drawing upon this definition, this study focuses on the
characteristics of race, class, sex, and beliefs and on whether there is a relationship between
students’ statistics course taking and such characteristics, in an effort to begin to investigate the
equity of students’ participation in opportunities to learn statistics concepts and practices. To
consider the issue of achievement, students’ mathematics achievement in terms of
algebraic reasoning, as well as their achievement in terms of their GPAs, was considered in
relation to whether or not they earned at least one credit in statistics.


Mode of Inquiry 

Data source 

The public access dataset from the HSLS:09 was the data source for this study. The goal of 
the HSLS:09 is to provide data to “better understand the impact of earlier educational 
experiences (starting at 9th grade) on high school performance and the impact of these 
experiences on the transitions that students make from high school to adult roles” (Ingels et al., 
2015, p. 6). As such, the HSLS:09 was designed to gather data on a sample that is nationally 
representative of students entering 9th grade in 2009 (n=23503). One of the goals of the HSLS:09
is to provide data to investigate, “the nature of the paths into and out of STEM (science, 
technology, engineering, and mathematics) curricula” (Ingels et al., 2015, p. 6). At this point data 
is available for the base year (Fall 2009), first follow-up (Spring 2012), post-secondary status 
update (Summer/Fall 2013), and high school transcript report (2013-2014).
Methods 

For this study, quantitative methods were employed to investigate variables related to 
students’ demographics, mathematics course taking, academic performance, and beliefs/attitudes 
towards math and science. Students’ mathematics course taking was considered in terms of 
whether or not they earned at least one credit in statistics/probability in high school and of
the highest-level mathematics course they took in high school. As a note, AP Statistics course
taking was not included in this analysis, because that data is only available in the restricted use 
data file. The HSLS:09 has a number of different academic performance variables, which include 
students’ performance on a mathematics assessment designed to assess students’ algebraic 
reasoning. In this analysis, I chose to use students’ theta score for their performance on the 
mathematics assessment, which provides a norm-referenced measurement of achievement, for 
the base year assessment and the first follow-up assessment. Students’ mathematics, STEM, and 
all academic course GPAs were also used as performance variables. Finally, students’ identity,
self-efficacy, utility, and interest beliefs/attitudes towards mathematics and science, which
were measured as normalized scale variables during the base year and the first follow-up, were
considered. A complete listing of the variables used in the analysis, with their descriptions,
can be found in Table 1.

An initial exploratory data analysis (Tukey, 1977) was conducted and important descriptive 
statistics are reported in the results. An inferential analysis of the data was then conducted to address
the research questions by investigating the relationship between students’ demographics, academic
performance, and beliefs/attitudes and whether or not they earned credit for taking a statistics course.
The inferential statistics used included two-sample t-tests for scale variables, which included
variables for students’ academic performance, beliefs/attitudes towards math and science, and 
SES, grouped based on whether or not they had at least one credit of statistics. In other words, 
the specific relationship that was investigated was whether or not there were differences in the 
variables considered, between the two groups. For categorical demographic variables, chi-square 
tests were used to determine if there were associations between the variables and the two groups 
considered. Standardized residuals were also employed in the case of statistically significant 
results of the chi-squared tests to get a better idea of which categories were significantly different 
from what was expected. Design-effect-normalized analytic weights were used for all inferential
statistics. Effect sizes are reported for the quantitative variables by converting the t-test statistics
to r and then using the commonly used cutoffs of r=.1 for small, r=.3 for medium, and r=.5 for
large effect sizes (Field, 2013). Effect sizes for the categorical variable χ²-test statistics are
reported using Cramér’s V and the same cutoffs of .1, .3, and .5.
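
As a minimal sketch of the two effect-size conversions described above, the Python below assumes the standard formulas r = sqrt(t² / (t² + df)) and Cramér's V = sqrt(χ² / (n · min(rows − 1, columns − 1))). The function names are illustrative, and the sample size passed to `cramers_v` is an assumption (df + 2 from the t-tests reported in Table 2), since the weighted n for the χ² tests is not stated.

```python
import math


def t_to_r(t: float, df: float) -> float:
    """Convert a t statistic to the effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


def cramers_v(chi2: float, n: float, n_rows: int, n_cols: int) -> float:
    """Cramer's V = sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))


def label(effect: float) -> str:
    """Apply the .1 / .3 / .5 cutoffs to the magnitude of an effect size."""
    size = abs(effect)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"


# Base-year math theta score: t and df as reported in Table 2.
r_theta = t_to_r(-10.51, 2581)

# Race by statistics credit: chi-square as reported in the results.
# n = 2583 is an ASSUMPTION (df + 2 from the t-test above);
# 2 credit groups x 8 race categories follows from df = 7.
v_race = cramers_v(17.923, 2583, 2, 8)

print(round(r_theta, 3), label(r_theta))
print(round(v_race, 3), label(v_race))
```

Under these assumptions the recomputed values agree with the reported r = .203 and Cramér's V = .083 to three decimal places.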

The method of focusing on transcript data to investigate students’ course taking is not new. 
The transcript studies conducted periodically as part of the National Assessment of Educational 
Progress (NAEP) have been a source of such information over the past few decades. Based on
such information, it was recently reported that statistics/probability course taking in high school
increased from 1% of high school graduates in 1990 to 11% in 2009 (NCES, 2016).
Unfortunately, the NAEP transcript data is limited in the information it collects on students,
as it is focused mostly on transcript data of course taking and is only linked to students’
performance on the NAEP assessment in 12th grade in some cases. The HSLS:09 public access
dataset includes significantly more variables related to students’ course taking, academic
performance, demographics, and beliefs/attitudes towards mathematics and science, making it
useful for investigating who the students taking statistics courses are.
Table 1: List of variable names and the descriptions from the HSLS:09 codebook (Ingels et al., 2015) for the variables analyzed in this study.

Variable Name | Description of Variable
X3T1CREDSTAT | Indicates at least one Carnegie unit in Statistics/Probability, does not include AP Statistics
X3THIMATH | Highest Mathematics course
X3TWHENALG1 | Indicates the grade level the student took Algebra I.
X1TXMTH & X2TXMTH | The mathematics theta score represents the student’s ability level on a continuous scale. The theta score provides a norm-referenced measurement of achievement, that is, an estimate of achievement relative to the population (fall 2009 9th graders) as a whole. It provides information on status compared to peers.
X1MTHID & X2MTHID | This variable is a scale of the sample member’s math identity. Sample members who tend to agree with the statements “You see yourself as a math person” or “Others see me as a math person” will have higher values.
X1MTHUTI & X2MTHUTI | This variable is a scale of the sample member’s perception of the utility of mathematics; higher values represent perceptions of greater mathematics utility.
X1MTHEFF & X2MTHEFF | This variable is a scale of the sample member’s math self-efficacy; higher values represent higher math self-efficacy.
X1MTHINT | This variable is a scale of the sample member’s interest in his or her base-year math course; higher values represent greater interest in the base-year math course.
X1SCIID & X2SCIID | This variable is a scale of the sample member’s science identity. Sample members who tend to agree with the statements “You see yourself as a science person” or “Others see me as a science person” will have higher values for X2SCIID.
X1SCIUTI & X2SCIUTI | This variable is a scale of the sample member’s perception of the utility of science; higher values represent perceptions of greater science utility.
X1SCIEFF & X2SCIEFF | This variable is a scale of the sample member’s science self-efficacy; higher X2SCIEFF values represent higher science self-efficacy.
X1SCIINT | This variable is a scale of the sample member’s interest in his or her base-year science course; higher values represent greater interest in the base-year science course.
X3TGPAMAT | GPA in Mathematics.
X3TGPASTEM | GPA in STEM courses.
X3TGPAACAD | GPA in Academic courses.
X1SEX | Student's sex
X1RACE | Student's race/ethnicity-composite
X1SES | Socio-economic status composite for base year
X2SES | Socio-economic status composite for first follow-up year


Results 

Of the sample of students, no transcript data was available for 1575 (6.7%). A total of 19748 (84%)
students did not earn any credit in statistics/probability, and 2180 (9.3%)
students earned at least one credit in statistics/probability, which is slightly lower
than the 11% of students reported in the 2009 NAEP transcript study (NCES, 2016).
Demographic Characteristics 

In considering the relationship between students’ demographic characteristics and whether or 
not they earned at least one credit in statistics, the characteristics of sex, race, and SES were 
investigated. Between sex and earning statistics credit, there was no significant association
(χ²=0.538, df=1, p=0.463). The association between race and earning statistics credit was
statistically significant (χ²=17.923, df=7, p=0.012). However, the effect size was small (Cramér’s
V=.083). In looking at the standardized residuals of the observed and expected cell counts, the
only significant residual (those beyond ±1.96) was for Asian students earning a credit in statistics, with
z=2.8, meaning there were significantly more Asian students who earned at least one credit in
statistics than expected. A further caution is that three of the expected counts were less than
five, though that constitutes only 18.8% of the cells, which is below the 20% cutoff that is
generally considered acceptable (Field, 2013). Finally, in considering the relationship between
SES and earning statistics credit, there were statistically significant differences between the
groups in both the base and first follow-up years, the results of which can be seen in Table 2. The
group differences in SES also had small effect sizes.
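
The checks applied to the chi-square result above (standardized residuals beyond ±1.96, and the share of cells with expected counts below five) can be sketched as follows. The counts are hypothetical and are not HSLS:09 data, the analysis here is unweighted, and the residual is taken as (observed − expected) / sqrt(expected), which is an assumption about the form used in the study.

```python
import math

# HYPOTHETICAL counts, not HSLS:09 data: rows are (earned credit, no credit),
# columns are three illustrative demographic categories.
observed = [
    [30, 50, 20],
    [170, 550, 180],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic and standardized residuals (O - E) / sqrt(E).
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Cells whose residual lies beyond +/-1.96 depart significantly from expectation.
flagged = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, z in enumerate(row)
    if abs(z) > 1.96
]

# Share of cells with an expected count below five (rule of thumb: under 20%).
cells = [e for row in expected for e in row]
share_small = sum(e < 5 for e in cells) / len(cells)

print(round(chi2, 3), flagged, share_small)
```

In the study's 2 × 8 race table, three of 16 cells falling below five gives 3/16 = 18.75%, which is the 18.8% cited in the text.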


Table 2: Difference in scale variables between students who did not earn any credits in 
statistics/probability versus those who earned at least one credit in statistics/probability. 


Variable | ≥1 Stat Cred Mean (SD) | No Statistics Mean (SD) | t | df | Mean Diff (SE) | 95% CI of Diff | Effect Size r
BY Math theta score | .563(.893) | -.089(.937) | -10.51*** | 2581 | -.652(.062) | [-.773, -.530] | .203
FY Math theta score | 1.295(1.102) | .538(1.123) | -10.28*** | 2581 | -.758(.074) | [-.902, -.613] | .198
BY Math Identity | .350(.970) | -.014(1.002) | -5.46*** | 2548 | -.364(.067) | [-.495, -.233] | .108
FY Math Identity | .331(1.047) | -.003(.997) | -5.02*** | 2510 | -.334(.066) | [-.464, -.203] | .100
BY Math Utility | .018(.981) | -.001(1.001) | -.270 | 2277 | -.019(.070) | [-.155, .118] | .006
FY Math Utility | .176(.979) | .018(.991) | -2.40* | 2503 | -.158(.066) | [-.287, -.029] | .048
BY Math SE | .225(.919) | -.007(1.012) | -3.31** | 2269 | -.231(.070) | [-.368, -.094] | .069
FY Math SE | .180(1.014) | .004(.996) | -2.64** | 2477 | -.176(.067) | [-.307, -.045] | .053
Interest F09 math | .277(.992) | -.014(.993) | -4.16*** | 2227 | -.290(.070) | [-.427, -.154] | .088
BY Science Identity | .245(.978) | .006(.993) | -3.62*** | 2542 | -.239(.066) | [-.369, -.109] | .072
FY Science Identity | .248(1.007) | -.008(.997) | -3.87*** | 2495 | -.257(.066) | [-.387, -.126] | .077
BY Science Utility | .071(1.014) | .004(.989) | -.95 | 2088 | -.067(.071) | [-.207, .072] | .021
FY Science Utility | .015(.071) | -.001(.069) | -3.44** | 2487 | -.016(.005) | [-.025, -.007] | .069
BY Science SE | .217(.991) | -.006(.985) | -3.15** | 2079 | -.223(.071) | [-.362, -.084] | .069
FY Science SE | .209(.968) | .004(1.005) | -3.05** | 2437 | -.205(.067) | [-.337, -.073] | .062
Interest F09 science | .127(.996) | -.014(1.009) | -1.93 | 2040 | -.141(.073) | [-.285, .002] | .043
GPA STEM courses | 2.889(.722) | 2.297(.930) | -9.76*** | 2576 | -.592(.061) | [-.711, -.473] | .189
GPA all courses | 3.023(.684) | 2.456(.889) | -9.79*** | 2576 | -.567(.058) | [-.680, -.453] | .189
GPA Math | 2.811(.810) | 2.218(.974) | -9.32*** | 2576 | -.594(.064) | [-.719, -.469] | .181
BY SES Composite | .281(.780) | -.109(.745) | -7.83*** | 2581 | -.390(.050) | [-.487, -.292] | .152
FY SES Composite | .306(.717) | -.075(.715) | -7.85*** | 2394 | -.381(.049) | [-.476, -.286] | .158


Note. Equal variance assumed for independent t tests. *p<.05, **p<.01, ***p<.001. BY=Base Year, 
FY=First Follow-up Year. 
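
The columns of Table 2 are linked by simple arithmetic, which the sketch below illustrates for the base-year math theta score row: the t statistic is the mean difference divided by its standard error, and the 95% confidence interval is the mean difference plus or minus a critical value times the standard error. The critical value of 1.96 is an assumption (the normal approximation, reasonable for df near 2581), and the helper name is illustrative.

```python
def t_and_ci(mean_diff: float, se: float, crit: float = 1.96):
    """Return (t, lower, upper) for a mean difference and its standard error.

    crit=1.96 is the large-sample normal approximation (an assumption here).
    """
    t = mean_diff / se
    half_width = crit * se
    return t, mean_diff - half_width, mean_diff + half_width


# Base-year math theta score row of Table 2: mean difference -.652, SE .062.
t, lower, upper = t_and_ci(-0.652, 0.062)
print(round(t, 2), round(lower, 3), round(upper, 3))
```

Because the table entries are themselves rounded, the recomputed values agree with the reported t and interval only to about two decimal places.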


Beliefs/Attitudes 

Looking at the belief/attitude scale score variables (see Table 2), during the base year
students who earned at least one credit in statistics/probability had significantly stronger
mathematics and science identities, mathematics and science self-efficacy beliefs, and interest in
their 9th grade mathematics course than students who did not. This pattern continued for the
identity and self-efficacy variables in the first follow-up of the study. There was no significant difference in
the mathematics or the science utility scale scores during the base year of the study. However, at
the time of the first follow-up, students who earned at least one credit in statistics/probability had
significantly higher mathematics and science utility scale scores than students who did not. It
seems that, over the course of three years of high school education, students who
earned at least one credit in statistics/probability by the end of high school
began to view mathematics as more useful on average, whereas the average scale score of
students who did not changed minimally from the base year to the first follow-up year. The same cannot be
said in the case of science, where the scale score for both populations decreased on average from
the base year to the first follow-up year. It is important to acknowledge that there is no way of
knowing when students earned at least one credit in statistics, so no inferences should be made
that temporal shifts in beliefs and attitudes might be influenced by statistics course taking. It is
also important to note that even though a number of the belief/attitude variables differed
significantly between the groups, none of the differences had more than small effect sizes, and
quite a few were below the r=.1 cutoff for small effect sizes.
Academic Performance 

In looking at the mathematics course taking of those students who earned at least one credit
in statistics/probability, a striking trend emerges: students taking statistics are largely the top
mathematics students. Nearly half (n=1053) of the students earning statistics/probability credit in
high school completed Algebra I in 8th grade, nearly a third (n=697) took statistics as their most
advanced mathematics course in high school, and over a quarter took calculus
in high school (see Table 3). This trend is also supported by the mathematics performance of
students who earned a statistics/probability credit versus those who did not (see Table 2). 
Students who earned at least one credit in statistics/probability had significantly higher theta 
scores on the algebraic reasoning assessment on average in both the base year and the first 
follow-up than students who did not earn a statistics/probability credit. This same trend can be 
seen in the GPA of students’ mathematics courses taken in high school, and in their GPA for 
STEM courses, and all academic courses (see Table 2). Furthermore, the differences had small to 
moderate effect sizes. These results seem to indicate that the students who earned at least one 
credit in statistics/probability may include more than just top performing mathematics students, 
but perhaps even the top performing students in general. 


Table 3: Frequency of students’ highest level mathematics course taken for students who earned at least one credit in statistics/probability

X3 Highest level mathematics course taken/pipeline | Frequency of students who earned at least one credit in statistics/probability
No Math | 0
Basic math | 0
Other math | 0
Pre-algebra | 0
Algebra I | 0
Geometry | 0
Algebra II | 0
Trigonometry | 0
Other advanced math | 0
Probability and statistics | 697
Other AP/IB math | 260
Precalculus | 596
Calculus | 144
AP/IB Calculus | 483
Total | 2180
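
The shares discussed in the Academic Performance section can be recomputed from the Table 3 frequencies. The sketch below is only an arithmetic check; the dictionary and helper names are illustrative, and the count of 1053 students who completed Algebra I in 8th grade comes from the text rather than from Table 3.

```python
# Frequencies from Table 3 (courses with a count of zero are omitted).
highest_course = {
    "Probability and statistics": 697,
    "Other AP/IB math": 260,
    "Precalculus": 596,
    "Calculus": 144,
    "AP/IB Calculus": 483,
}

total = sum(highest_course.values())


def share(count: int) -> float:
    """Percentage of statistics-credit earners, rounded to one decimal place."""
    return round(100 * count / total, 1)


stats_highest = share(highest_course["Probability and statistics"])
any_calculus = share(highest_course["Calculus"] + highest_course["AP/IB Calculus"])
# 1053 students completed Algebra I in 8th grade (reported in the text, not in Table 3).
early_algebra = share(1053)

print(total, stats_highest, any_calculus, early_algebra)
```

The nonzero rows sum to the reported total of 2180, which confirms that the table accounts for every student who earned a statistics/probability credit.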


Discussion and Scholarly Significance 

Considering the results as a whole, the two most significant factors that differed between the
group of students who earned at least one credit in statistics and the group of students who did not
were academic performance and SES. Though other factors differed with statistical
significance between the two groups, they all had small or negligible effect sizes and, given the
large sample size, should be interpreted with caution, as only small differences are needed for
statistical significance and such differences may not be significant in a practical sense. One possible reason
that predominantly top academically performing students earn at least one credit in statistics is
that they have more time in their schedule to take additional mathematics courses like statistics.
Most states require at least three years of mathematics, which generally includes an Algebra I,
Geometry, Algebra II trajectory or a more integrated Math I, Math II, Math III trajectory where
the various strands of mathematics are taught together over three years. That means that students
who take the typical 9th grade mathematics course of Algebra I or Math I in 8th grade have more
time to take statistics as an elective course in their junior or senior year.

The cohort of students in the HSLS:09 were in high school during the very beginning of the 
influence of the CCSSM (NGA Center & CCSSO, 2010), which does recommend that statistics 
concepts be included in all students’ high school mathematics education. However, the enacted 
curriculum that students experience is generally carried out by the classroom teacher who 
mediates the influence that official curriculum, such as the CCSSM, have on the enacted 
curriculum (Remillard & Heck, 2014), and as I discussed earlier, statistics has a unique position
in this regard, as many mathematics teachers have had little to no prior experience with
learning statistics (Shaughnessy, 2007). Though the results here do not show that students
lack access to opportunities to learn statistics, they do point out that there may, at the very
least, be an issue in how statistics courses are offered or advertised to students, or perhaps in
students’ perceptions of the usefulness of statistics or of who should take it. The results do show
that the majority of students are not earning any credits in statistics/probability, which means if 
they are not experiencing statistics in their other mathematics courses, they are not having any 
experiences with statistics in high school. It is important that more research is done around 
students’ opportunities to learn statistics at the K-12 level especially in the context of the enacted 
curriculum that students experience and how students are provided access to opportunities to 
learn statistics at the school level. 

A limitation in this analysis is that using the HSLS:09 data it is only possible to determine 
which students earned at least one credit of statistics/probability in high school. It does not 
provide information on specific course performance, when the student took the course, or 
identify those students who were enrolled in a statistics/probability course but did not earn credit, 
which limits the results reported. Furthermore, it is not possible to determine the quality of the 
students’ instruction or whether or not they had any opportunities to learn statistics in any of 
their other courses. However, given the current dearth of empirical research looking at the 
teaching or learning of statistics on any kind of large scale, these results help to give some idea 
of patterns in statistics course taking, which have implications for further research and for policy. 
Another limitation is not having transcript data on who earned credit for AP Statistics. However,
given that AP courses are advanced courses, the inclusion of such data would likely only have
amplified the results, as the students taking AP courses are generally high performing and
of a higher SES. Still, it is important that future analyses include such data.

Related to the equity framework used in this study, the results reported here are significant in
that if the goal of education is democratic equality, preparing students to participate as critical
citizens in a society where statistical reasoning is crucial, then there is a serious issue in that it
appears that predominantly only the top performing students in high school are earning credits in
statistics/probability. It is promising that students’ sex was found to have no association with 
their taking of statistics. However, race does appear to have a weak association and there are 
significant differences in the academic performance and SES of those who earned credit in 
statistics and those who did not, which points to inequity in students’ access to statistics. Going 
back to Gutiérrez’s (2009) equity framework, it is also important to point out that using this data, 
equity can only be considered through the dominant axis, which means there is still the issue of 
the critical axis of equity, namely identity and power, to consider in future work. Though the 
variables of math and science identity were considered in the HSLS:09, they are different from
the identity construct that Gutiérrez (2009) discusses. The identity variables in the HSLS:09
consider an individual’s identity relative to its alignment with the discipline, whereas Gutiérrez
(2009) is looking at identity from the perspective of the individual and how they see themselves
in relation to the discipline and curriculum. If we are to achieve the promise of
equal educational opportunity, it is crucial that all students have experiences learning statistics
concepts and practices, and that they have experiences in seeing themselves in the curriculum
and considering how to read and write the world with statistics (Weiland, 2017).